    Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling

    Conditional Random Fields (CRFs) constitute a popular and efficient approach for supervised sequence labelling. CRFs can cope with large description spaces and can integrate some form of structural dependency between labels. In this contribution, we address the issue of efficient feature selection for CRFs based on imposing sparsity through an L1 penalty. We first show how sparsity of the parameter set can be exploited to significantly speed up training and labelling. We then introduce coordinate descent parameter update schemes for CRFs with L1 regularization. Finally, we provide empirical comparisons of the proposed approach with state-of-the-art CRF training strategies. In particular, we show that the proposed approach is able to exploit sparsity to speed up processing and hence potentially handle higher-dimensional models.
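
    The coordinate descent schemes rest on a per-weight soft-thresholding step, which sets a weight exactly to zero whenever its correlation with the residual is weaker than the penalty; this is what keeps the parameter set sparse. A minimal sketch of that step, assuming a plain least-squares loss in place of the CRF log-likelihood (a CRF version would typically minimise a local quadratic approximation of the log-likelihood for each coordinate instead of the closed form used here); `soft_threshold` and `lasso_coordinate_descent` are hypothetical names, not the paper's code.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of gamma * |w|: shrink z towards 0, clipping to exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             sweeps: int = 100) -> List[float]:
    """Minimise (1 / 2n) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    Assumption: least-squares loss stands in for the CRF negative log-likelihood.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # residual = y - X w, updated incrementally
    for _ in range(sweeps):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(v * v for v in col) / n
            if norm_sq == 0.0:
                continue  # an all-zero feature can never become active
            # Correlation of feature j with the partial residual (w_j added back).
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / norm_sq
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_wj
    return w
```

    On two orthogonal features with `y` equal to twice the first feature and `lam = 0.5`, the sketch returns `[1.5, 0.0]`: the informative weight is shrunk by the penalty and the irrelevant one is exactly zero.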

    Deep Learning for Metagenomic Data: using 2D Embeddings and Convolutional Neural Networks

    Deep learning (DL) techniques have had unprecedented success when applied to images, waveforms, and text, to cite a few. In general, when the sample size (N) is much greater than the number of features (d), DL outperforms previous machine learning (ML) techniques, often through the use of convolutional neural networks (CNNs). However, in many bioinformatics ML tasks we encounter the opposite situation, where d is greater than N. In these situations, applying DL techniques (such as feed-forward networks) would lead to severe overfitting, so sparse ML techniques (such as the LASSO) usually yield the best results. In this paper, we show how to apply CNNs to data that do not originally have an image structure (in particular, metagenomic data). Our first contribution is to show how to map metagenomic data in a meaningful way to 1D or 2D images. Based on this representation, we then apply a CNN with the aim of predicting various diseases. The proposed approach is applied to six different datasets comprising over 1000 samples in total from various diseases. This approach could be a promising one for prediction tasks in the bioinformatics field.
    Comment: Accepted at NIPS 2017 Workshop on Machine Learning for Health (https://ml4health.github.io/2017/); in Proceedings of the NIPS ML4H 2017 Workshop in Long Beach, CA, USA
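
    The abstract does not spell out the mapping, so the following is only the simplest possible layout, offered as an assumption: write the (already ordered) abundance vector row by row into the smallest square grid that can hold it and pad the remainder with zeros. `vector_to_square_image` is a hypothetical helper; choosing a meaningful order for the species, which is what makes neighbouring pixels related, is left to the caller.

```python
import math
from typing import List, Sequence


def vector_to_square_image(abundances: Sequence[float]) -> List[List[float]]:
    """Lay out a 1D abundance vector row by row in the smallest square grid that fits it.

    Cells beyond the end of the vector are padded with 0.0, so an absent species
    and a padding cell look the same to a downstream CNN.
    """
    d = len(abundances)
    side = math.isqrt(d - 1) + 1 if d > 0 else 0  # ceil(sqrt(d)) in integer arithmetic
    padded = list(abundances) + [0.0] * (side * side - d)
    return [padded[r * side:(r + 1) * side] for r in range(side)]
```

    For d = 5 features the result is a 3 x 3 grid whose last four cells are padding; a CNN can then consume the grid as a one-channel image.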

    SPI-GCN: A Simple Permutation-Invariant Graph Convolutional Network

    A wide range of machine learning problems involve handling graph-structured data. Existing machine learning approaches for graphs, however, often require computing expensive graph similarity measures, preprocessing input graphs, or explicitly ordering graph nodes. In this work, we present a novel and simple convolutional neural network architecture for supervised learning on graphs that is provably invariant to node permutation. The proposed architecture operates directly on arbitrary graphs and performs no node sorting. It also uses a simple multi-layer perceptron for prediction, as opposed to the conventional convolution layers commonly used in other deep learning approaches for graphs. Despite its simplicity, our architecture is competitive with state-of-the-art graph kernels and existing graph neural networks on benchmark graph classification data sets. Our approach clearly outperforms other deep learning algorithms for graphs on multiple multiclass classification tasks. We also evaluate our approach on an original real-world application in materials science, on which we achieve very reasonable results.
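
    Permutation invariance can be obtained by composing a permutation-equivariant graph convolution with an order-independent readout: relabelling the nodes only reorders the rows of tanh(D^-1 (A + I) X W), and summing over the rows removes that order. A minimal sketch of this argument, assuming a mean-aggregation convolution with self-loops, a tanh activation and a sum readout; these choices are illustrative and not necessarily SPI-GCN's exact layer, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(adj: Matrix, feats: Matrix, weights: Matrix) -> Matrix:
    """tanh(D^-1 (A + I) X W): each node averages itself and its neighbours, then projects."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degrees = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loop
    norm = [[a_hat[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    return [[math.tanh(v) for v in row] for row in matmul(matmul(norm, feats), weights)]


def graph_embedding(adj: Matrix, feats: Matrix, weights: Matrix) -> List[float]:
    """Sum-pool node representations; a sum does not depend on node order."""
    h = graph_conv(adj, feats, weights)
    return [sum(row[j] for row in h) for j in range(len(h[0]))]


def permute(adj: Matrix, feats: Matrix, perm: List[int]) -> Tuple[Matrix, Matrix]:
    """Relabel the nodes so that new node k is old node perm[k]."""
    n = len(perm)
    new_adj = [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    new_feats = [feats[perm[i]] for i in range(n)]
    return new_adj, new_feats
```

    The resulting fixed-length vector is what a multi-layer perceptron would take as input for prediction; relabelling the nodes of a 4-node path graph leaves it unchanged up to floating-point rounding.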

    Handling Expensive Optimization with Large Noise

    This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of the number of fitness evaluations. The fitnesses considered are monotonic transformations of the sphere function. The analysis focuses on the common case of fitness functions that are quadratic in the distance to the optimum in the neighborhood of this optimum; it is nonetheless also valid for any monotonic polynomial of degree p > 2. Upper bounds are derived via a bandit-based estimation of distribution algorithm, called R-EDA, that relies on Bernstein races. The algorithm is known to be consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, it can perform optimally for quadratic transformations of the norm to the optimum; (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression (QLR), which is based on surrogate models, although QLR requires a probabilistic prior on the fitness class.
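
    A Bernstein race compares noisy points by re-evaluating them until empirical Bernstein confidence intervals on their mean fitness no longer overlap. A minimal sketch of a two-point race for minimisation, assuming every fitness sample of a given point lies in an interval of known width `value_range` and using one common form of the empirical Bernstein bound, with the confidence level split across points and rounds; this is only the racing component, not the full R-EDA, and `bernstein_race` and `noisy_sphere` are hypothetical names.

```python
import math
import random
from typing import Callable, List


def bernstein_radius(samples: List[float], value_range: float, delta: float) -> float:
    """Half-width of an empirical Bernstein confidence interval for the mean of `samples`.

    Holds with probability at least 1 - delta when every sample lies in an interval
    of width `value_range`.
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n  # empirical (biased) variance
    log_term = math.log(3.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 3.0 * value_range * log_term / n


def bernstein_race(noisy_fitness: Callable[[float], float], x_a: float, x_b: float,
                   value_range: float, delta: float = 0.05, max_rounds: int = 10_000) -> int:
    """Re-evaluate two points until their intervals separate.

    Returns 0 if x_a has the lower (better) mean fitness, 1 if x_b does, -1 if undecided.
    """
    samples: List[List[float]] = [[], []]
    for n in range(1, max_rounds + 1):
        samples[0].append(noisy_fitness(x_a))
        samples[1].append(noisy_fitness(x_b))
        delta_n = delta / (2 * n * (n + 1))  # sums to delta over both points and all rounds
        means = [sum(s) / n for s in samples]
        radii = [bernstein_radius(s, value_range, delta_n) for s in samples]
        if means[0] + radii[0] < means[1] - radii[1]:
            return 0
        if means[1] + radii[1] < means[0] - radii[0]:
            return 1
    return -1


def noisy_sphere(x: float) -> float:
    """Hypothetical test fitness: sphere x**2 plus uniform noise in [-1, 1] (range 2)."""
    return x * x + random.uniform(-1.0, 1.0)
```

    With `noisy_sphere`, `bernstein_race(noisy_sphere, 0.0, 3.0, value_range=2.0)` returns 0; because the two points' sample ranges do not overlap, the race cannot pick the wrong point, and it stops within a few dozen evaluations per point.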

    Assessment of Quality Management Systems of Service Companies

    Decision-making in the formation of quality management systems that comply with the requirements of the international standard ISO 9001:2015 should be a strategically important area of activity for enterprises in the service sector, and should be based on effective methods, measures, methodologies, and other quality management tools. The aim of this article is to study the existing methodological approaches to evaluating the quality management systems of enterprises and to develop effective practical tools for applying them in the field of engineering services. The existing methodological approaches are reviewed, with attention to the advantages and disadvantages of each. Directions for evaluating the quality management systems of engineering-services enterprises on the basis of the requirements of ISO 9001:2015 are proposed. An algorithm for the expert evaluation of the processes of an engineering-services enterprise's quality management system is developed, and recommendations for its application are provided. The advisability of applying the balanced scorecard (BSC) methodology to evaluate the quality management systems of engineering-services enterprises is also substantiated. A strategy map of an engineering-services enterprise is built from a balanced system of indicators for assessing quality management systems. A comparative analysis of the costs of the enterprise's business processes before and after the implementation of the quality management system is conducted, together with a calculation of the economic effect of this implementation.

    Continuous Upper Confidence Trees

    Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, they are surprisingly efficient in high-dimensional problems in particular. They are known to be adaptable to continuous domains in some cases (in particular, continuous action spaces). We present here an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach does not work, even with arbitrarily large computational power and with progressive widening; (ii) propose an improvement, termed double progressive widening, which handles the compromise between variance (we want infinitely many simulations for each action/state) and bias (we want sufficiently many nodes to avoid a bias from the first nodes) and which extends the classical progressive widening; and (iii) discuss its consistency and show experimentally that it performs well on the deceptive problem and on experimental benchmarks. We conjecture that the double progressive widening trick can be used for other algorithms as well, as a general tool for ensuring a good bias/variance compromise in search algorithms.
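
    Progressive widening caps the number of children of a node at a slowly growing function of its visit count; double progressive widening applies that cap twice, once to the continuous actions tried at a state node and once to the stochastic outcomes stored under each action. A minimal sketch of the two widening tests, assuming the common rule `children < ceil(C * visits ** alpha)` with illustrative constants `C = 1` and `alpha = 0.5`; class and function names are hypothetical, and the UCB selection, rollouts and back-propagation are omitted.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List


def may_widen(visits: int, n_children: int, c: float, alpha: float) -> bool:
    """Progressive widening test: a node may gain a child while n_children < ceil(c * visits**alpha)."""
    return n_children < math.ceil(c * visits ** alpha)


@dataclass
class ActionNode:
    action: float
    visits: int = 0
    next_states: List[float] = field(default_factory=list)  # sampled outcomes kept in the tree


@dataclass
class StateNode:
    state: float
    visits: int = 0
    actions: List[ActionNode] = field(default_factory=list)


def select_action(node: StateNode, sample_action: Callable[[], float],
                  c: float = 1.0, alpha: float = 0.5) -> ActionNode:
    """First widening: try a new continuous action only when the visit count allows it."""
    node.visits += 1
    if not node.actions or may_widen(node.visits, len(node.actions), c, alpha):
        node.actions.append(ActionNode(sample_action()))
        chosen = node.actions[-1]
    else:
        # Placeholder for the UCB score a real tree policy would use.
        chosen = min(node.actions, key=lambda a: a.visits)
    chosen.visits += 1
    return chosen


def sample_next_state(action_node: ActionNode, state: float,
                      transition: Callable[[float, float], float],
                      c: float = 1.0, alpha: float = 0.5) -> float:
    """Second widening: draw a fresh stochastic outcome only when allowed, else reuse a stored one."""
    if not action_node.next_states or may_widen(action_node.visits, len(action_node.next_states), c, alpha):
        action_node.next_states.append(transition(state, action_node.action))
        return action_node.next_states[-1]
    # Simplification: reuse a stored outcome uniformly at random.
    return random.choice(action_node.next_states)
```

    After 99 visits with these constants a node holds ceil(sqrt(99)) = 10 children. A smaller `alpha` keeps fewer children with more simulations each (lower variance), while a larger `alpha` adds children faster (less bias from the first ones sampled), which is the compromise the abstract describes.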

    A Principled Method for Exploiting Opening Books

    In the past, we used a lot of computational power and human expertise to collect a very large dataset of good 9x9 Go games in order to build an opening book. We substantially improved the algorithm used for generating these games. Unfortunately, the results were not very robust, because (i) opening books are definitely not transitive, which makes non-regression testing extremely difficult; (ii) different time settings lead to opposite conclusions, since a good opening for a game with 10 s per move on a single core is very different from a good opening for a game with 30 s per move on a 32-core machine; and (iii) some very bad moves sometimes occur. In this paper, we formalize the optimization of an opening book as a matrix game, compute the Nash equilibrium, and conclude that a naturally randomized opening book provides optimal performance (in the sense of Nash equilibria). Surprisingly, from a finite set of opening books, we can choose a distribution over these opening books such that this random solution performs significantly better than each of the deterministic opening books.
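
    The matrix-game formulation can be illustrated with a small solver. A minimal sketch, assuming the pairwise results between opening books are already available as a payoff matrix for one side (for win rates, subtracting 0.5 from every entry gives a zero-sum game with the same equilibria); it approximates the equilibrium with fictitious play, chosen here for brevity rather than taken from the paper (a linear-programming solver gives the exact equilibrium). `fictitious_play` and `guaranteed_value` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fictitious_play(payoff: Matrix, iterations: int) -> Tuple[List[float], List[float]]:
    """Approximate a mixed Nash equilibrium of a two-player zero-sum matrix game.

    payoff[i][j] is what the row player wins (and the column player loses) when row
    strategy i meets column strategy j. Each round, both players best-respond to the
    opponent's empirical history; in zero-sum games the empirical frequencies
    converge to equilibrium strategies.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(iterations):
        row_scores = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols)) for i in range(n_rows)]
        col_scores = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows)) for j in range(n_cols)]
        row_counts[max(range(n_rows), key=row_scores.__getitem__)] += 1  # row maximises
        col_counts[min(range(n_cols), key=col_scores.__getitem__)] += 1  # column minimises
    return [c / iterations for c in row_counts], [c / iterations for c in col_counts]


def guaranteed_value(payoff: Matrix, row_mix: Sequence[float]) -> float:
    """Worst-case expected payoff of a mixed row strategy over all column replies."""
    return min(sum(p * payoff[i][j] for i, p in enumerate(row_mix)) for j in range(len(payoff[0])))
```

    On matching pennies, `[[1, -1], [-1, 1]]`, each pure strategy guarantees only -1, while the approximately uniform mixture returned by fictitious play guarantees close to 0, a small-scale analogue of a random distribution over books beating every deterministic book.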

    Akkermansia muciniphila and improved metabolic health during a dietary intervention in obesity: relationship with gut microbiome richness and ecology

    Objective: Individuals with obesity and type 2 diabetes differ from lean and healthy individuals in their abundance of certain gut microbial species and in microbial gene richness. Abundance of Akkermansia muciniphila, a mucin-degrading bacterium, has been inversely associated with body fat mass and glucose intolerance in mice, but more evidence is needed in humans. The impact of diet and weight loss on this bacterial species is unknown. Our objective was to evaluate the association between fecal A. muciniphila abundance, fecal microbiome gene richness, diet, host characteristics, and their changes after calorie restriction (CR). Design: The intervention consisted of a 6-week CR period followed by a 6-week weight stabilization (WS) diet in overweight and obese adults (N=49, including 41 women). Fecal A. muciniphila abundance, fecal microbial gene richness, diet, and bioclinical parameters were measured at baseline and after CR and WS. Results: At baseline, A. muciniphila was inversely related to fasting glucose, waist-to-hip ratio, and subcutaneous adipocyte diameter. Subjects with higher gene richness and A. muciniphila abundance exhibited the healthiest metabolic status, particularly in fasting plasma glucose, plasma triglycerides, and body fat distribution. Individuals with higher baseline A. muciniphila displayed greater improvement in insulin sensitivity markers and other clinical parameters after CR. A. muciniphila was associated with microbial species known to be related to health. Conclusion: A. muciniphila is associated with a healthier metabolic status and better clinical outcomes after CR in overweight/obese adults; however, the interaction between gut microbiota ecology and A. muciniphila has to be taken into account.